Energy-efficient Fine-grained Many-core Architecture for Video and DSP Applications
Abstract
Many-core processor architecture has become the most promising computer architecture. However, utilizing the extra system performance for real applications such as video encoding remains challenging. This dissertation investigates the architecture design, physical implementation, and performance evaluation of a fine-grained many-core processor for advanced video coding, with a focus on interconnection, topology, memory system, and the related parallel programming methodology. A baseline residual encoder for H.264/AVC is proposed on a current-generation fine-grained many-core system that uses no application-specific hardware. The 25-processor encoder handles video sequences with variable frame sizes and can encode 1080p HDTV at 30 frames per second with 293 mW average power consumption by adjusting each processor to workload-based optimal clock frequencies and dual supply voltages. This is a 38.4% power reduction compared to operation with only one clock frequency and supply voltage. In comparison to published implementations on the TI C642 DSP platform, the design has approximately 2.9–3.7 times higher scaled throughput, 11.2–15.0 times higher throughput per chip area, and 4.5–5.8 times lower energy per pixel. Compared to a heterogeneous SIMD architecture customized for H.264, the presented design has 2.8–3.6 times greater throughput, 4.5–5.9 times higher area efficiency, and similar energy efficiency. Next, this dissertation proposes novel processor shapes and interconnection topologies for many-core processor arrays, which result in an overall application processor that requires fewer cores and has a lower total communication length. The proposed topologies are compared to the commonly used 2D mesh and include two 8-neighbor topologies, two 5-nearest-neighbor topologies, and three 6-nearest-neighbor topologies, three of which utilize 5-sided or hexagonal processor tiles.
A 1080p H.264/AVC residual video encoder and a complete 54 Mbps 802.11a/11g wireless LAN baseband receiver are mapped onto all topologies and compared. The methodology to implement an array of hexagonal-shaped processor tiles with industry-standard CAD tools and automatic place and route flow is described. A 16-
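The power and throughput figures quoted in the abstract can be turned into two rough derived quantities: the average energy spent per pixel, and the power that single-frequency, single-voltage operation would imply. The sketch below is only a back-of-the-envelope illustration. It assumes that 1080p means 1920 x 1080 pixels per frame and that the 38.4% reduction is measured relative to the single-voltage baseline; the function names are hypothetical and do not come from the dissertation.

```python
# Rough arithmetic on the figures quoted in the abstract.
# Assumption: a 1080p frame is 1920 x 1080 pixels (not stated in the abstract).

FRAME_WIDTH = 1920          # assumed
FRAME_HEIGHT = 1080         # assumed
FRAMES_PER_SECOND = 30      # from the abstract
AVERAGE_POWER_MW = 293.0    # from the abstract
POWER_REDUCTION = 0.384     # from the abstract (38.4%)


def pixels_per_second(width, height, fps):
    """Number of pixels the encoder must process each second."""
    return width * height * fps


def energy_per_pixel_nj(power_mw, pixel_rate):
    """Average energy per pixel in nanojoules.

    mW -> W is a factor of 1e-3 and J -> nJ is a factor of 1e9,
    so the combined scale factor is 1e6.
    """
    return power_mw * 1e6 / pixel_rate


def baseline_power_mw(reduced_power_mw, reduction):
    """Power implied for the baseline, assuming
    reduced = baseline * (1 - reduction)."""
    return reduced_power_mw / (1.0 - reduction)


if __name__ == "__main__":
    rate = pixels_per_second(FRAME_WIDTH, FRAME_HEIGHT, FRAMES_PER_SECOND)
    print(f"pixel rate: {rate} pixels/s")
    print(f"energy per pixel: {energy_per_pixel_nj(AVERAGE_POWER_MW, rate):.2f} nJ")
    print(f"implied baseline power: {baseline_power_mw(AVERAGE_POWER_MW, POWER_REDUCTION):.1f} mW")
```

Under these assumptions the residual encoder spends roughly 4.7 nJ per pixel, and single-frequency, single-voltage operation would draw roughly 476 mW. These values are derived here for illustration and are not reported in the abstract.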
Similar resources
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
An Energy-efficient Parallel H.264/AVC Baseline Encoder on a Fine-grained Many-core System
The emerging many-core architecture provides a flexible solution for rapidly evolving multimedia applications that demand both high performance and high energy efficiency. However, developing parallel multimedia applications that can efficiently harness many-core architectures is the key challenge for scalable computing. We contribute to this challenge by presenting a fully-parallel ...
Energy-Efficient String Search Architectures on a Fine-Grained Many-Core Platform
This paper presents three energy-efficient methods for searching and filtering streamed data on a fine-grained manycore processor array: parallel, serial, and all-in-one. All three architectures aim to provide programmable flexibility with low energy consumption. Experimental results show that for one keyword search, the parallel and serial architectures consume 2× less energy per workload than...
Design and Implementation of Digital Demodulator for Frequency Modulated CW Radar (RESEARCH NOTE)
Radar Signal Processing has been an interesting area of research for realization of programmable digital signal processor using VLSI design techniques. Digital Signal Processing (DSP) algorithms have been an integral design methodology for implementation of high speed application specific real-time systems especially for high resolution radar. CORDIC algorithm, in recent times, is turned out to...
An Energy Efficient Real-Time Object Recognition Processor with Neuro-Fuzzy Controlled Workload-aware Task Pipelining
An energy efficient pipelined architecture is proposed for multi-core object recognition processor. The proposed neuro-fuzzy controller and intelligent estimation of the workload of input video stream enable seamless pipelined operation of the 3 object recognition tasks. The neuro-fuzzy controller extracts the fine-grained region-of-interest, and its task pipelining achieves 60.6fps, 5.8x highe...
Publication date: 2012